Extraction of Visual Features for Lipreading
نویسندگان
چکیده
The multi-modal nature of speech is often ignored in human-computer interaction but lip deformation, and other body such as head and arm motion all convey additional information. We integrate speech cues from many sources and this improves intelligibility, especially when the acoustic signal is degraded. This paper shows how this additional, often complementary, visual speech information can be used for speech recognition. Three methods for parameterising lip image sequences for recognition using hidden Markov models are compared. Two of these are top-down approaches that fit a model of the inner and outer lip contours and derive lipreading features from a principal component analysis of shape, or shape and appearance respectively. The third, bottom-up, method uses a nonlinear scale-space analysis to form features directly from the pixel intensity. All methods are compared on a multi-talker visual speech recognition task of isolated letters. Now at, Human-Computer Interaction Institute, Carnegie Mellon University, Pittsburgh, USA
منابع مشابه
Improvement of lipreading performance using discriminative feature and speaker adaptation
In this paper, we apply a general and discriminative feature ”GIF” (Genetic Algorithm based Informative feature) to lipreading (visual speech recognition), and improve the lipreading performance using speaker adaptation. The feature extraction method consists of two transforms, which convert an input vector into GIF for recognition. In the speaker adaptation, MAP (Maximum A Posteriori) adaptati...
متن کاملImproved ROI and within frame discriminant features for lipreading
We study three aspects of designing appearance based visual features for automatic lipreading: (a) The choice of the video region of interest (ROI), on which image transform features are obtained; (b) The extraction of speech discriminant features at each frame; and (c) The use of temporal information to improve visual speech modeling. In particular, with respect to (a), we propose a ROI that i...
متن کاملComparing visual features for lipreading
For automatic lipreading, there are many competing methods for feature extraction. Often, because of the complexity of the task these methods are tested on only quite restricted datasets, such as the letters of the alphabet or digits, and from only a few speakers. In this paper we compare some of the leading methods for lip feature extraction and compare them on the GRID dataset which uses a co...
متن کاملA Multibiometric Speaker Authentication System with SVM Audio Reliability Indicator
Performances of biometric speaker authentication systems are good in clean conditions but their reliability drops severely in noisy environments. Implementation of multibiometric systems using audio and visual experts is one of the solutions to this limitation. In this study, weighting for fusing the audio and visual expert scores is proposed to be adapted corresponding to the current environme...
متن کاملSpeaker - Independent Visual Lip Activity Detection for Human - Computer Interaction
Recently there is an increased interest in using the visual features for improved speech processing. Lip reading plays a vital role in visual speech processing. In this paper, a new approach for lip reading is presented. Visual speech recognition is applied in mobile phone applications, human-computer interaction and also to recognize the spoken words of hearing impaired persons. The visual spe...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- IEEE Trans. Pattern Anal. Mach. Intell.
دوره 24 شماره
صفحات -
تاریخ انتشار 2002